Telematics data comes from the telematics system built into a vehicle's tracking equipment, which records the data being sent, received, or stored by that system. More specifically, telematics data includes speed, precise location (addresses or coordinates), timestamps, routes taken, safety metrics, acceleration, fuel consumption, and more. Telematics data can also be split across multiple datasets, each containing a subset of variables; an example is described below using the data behind this report.
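To make this concrete, a single telematics record might look something like the following. The field names and values here are invented for illustration and are not the schema of any particular provider:

```r
# A hypothetical single telematics record, illustrating the kinds of
# variables described above (field names and values are assumptions):
record <- data.frame(
  vehicle_id = 71,
  timestamp  = as.POSIXct("2021-08-03 14:22:05", tz = "UTC"),
  lat        = 39.9612,   # approximate Columbus, OH coordinates
  lng        = -82.9988,
  speed_mph  = 34.5,
  accel_g    = 0.12,      # longitudinal acceleration
  fuel_level = 0.63       # fraction of tank remaining
)
str(record)
```

In practice a provider streams millions of such rows, typically split across several tables (trips, locations, diagnostics), as in the dataset described later in this report.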
But why should one care about telematics data? For one, vehicles, especially cars, are not going anywhere anytime soon, so any data collected on their performance can help identify ways to make not only individual vehicles but the transport system as a whole operate more efficiently. This can reduce both traffic congestion and fuel consumption. With clean energy and the energy transition becoming a priority for key stakeholders from policymakers to drivers, it is more important now than ever to build sustainable transportation systems, and telematics data can help push this endeavor forward.
Another crucial use of telematics data is finding patterns in road safety. Pinning down unsafe routes, or even unsafe times of day, can inform the authorities and policymakers responsible for keeping everyone safe. From a technological point of view, the more telematics data is gathered, the more effective existing tracking and GPS systems can become. With AI on the rise, it is only a matter of time before AI is implemented in vehicle systems, including tracking devices, and telematics data is a natural source for structuring and training such models to make travel safer and faster.
The automotive industry already relies on telematics data. Insurance companies use it to monitor driver behavior, recording harsh events and computing safety scores. Ride-share apps use it for vehicle tracking. Car manufacturers use it to improve maintenance, estimating when service is due from engine hours, miles driven, and similar signals. It is also central to fleet management: keeping a set of vehicles running on time, within budget, and at maximum efficiency.
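A usage-based maintenance rule of the kind described above could be sketched as follows. The thresholds are invented for illustration and do not come from any manufacturer:

```r
# Toy usage-based maintenance rule: flag a vehicle for service once
# either the engine-hour or mileage threshold is reached
# (thresholds below are illustrative assumptions)
maintenance_due <- function(engine_hours, miles_since_service) {
  engine_hours >= 500 | miles_since_service >= 5000
}

maintenance_due(engine_hours = 120, miles_since_service = 5200)  # TRUE: mileage threshold hit
maintenance_due(engine_hours = 300, miles_since_service = 1000)  # FALSE: neither threshold hit
```

Real systems would combine many more signals (oil life, fault codes, harsh-event counts), but the structure is the same: telematics variables feeding a decision rule.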
We were provided with eight microtransit tables containing data on the organization associated with each ride, trip requests made by riders, general route information together with harsh events and safety details, and vehicle information with diagnostics and locations. For our analysis, we focused mainly on the trip request and vehicle location tables.
Both the trip request table and the vehicle location table contain data spanning December 2020 to February 2023. The trip request table holds general information about each requested trip, including its scheduled and requested times, pick-up and drop-off times and locations, completion status, estimated miles, and fare. The vehicle location table holds real-time location information for full trips: where the vehicle was (latitude and longitude) and its speed throughout the trip.
#R version 4.3.1 (2023-06-16)
#Platform: x86_64-pc-linux-gnu (64-bit)
#Running under: Ubuntu 20.04.1 LTS
#Used remote Smith RStudio
library(tidyverse) #tidyverse_2.0.0
library(tidycensus) #tidycensus_1.5
library(tidygeocoder) #tidygeocoder_1.0.5
library(sf) #sf_1.0-14
library(stringr) #stringr_1.5.0
library(plotly) #plotly_4.10.2
library(sfheaders) #sfheaders_0.4.3
library(leaflet) #leaflet_2.1.2
library(leaflet.extras2) #leaflet.extras2_1.2.2
library(htmlwidgets) #htmlwidgets_1.6.2
#using readr::read_csv for both tables so each loads as a tibble
Trip_request <- read_csv("data/TRIP_REQUEST_202308141518.csv")
vehicle_location_data <- read_csv("data/VEHICLE_LOCATION_202308141525.csv")
## geocoding ALL pick-up and drop-off addresses into coordinates
trip_data_full <- Trip_request %>%
geocode(DROPOFF_ADDRESS, method = 'census', lat = lat_drop, long= long_drop) %>%
geocode(PICKUP_ADDRESS, method = 'census', lat = lat_pickup, long= long_pickup) %>%
drop_na()
# variable tables for the 2020 (5-year) and 2021/2022 (1-year) ACS releases
var_acs20 <- load_variables(2020, "acs5", cache = TRUE)
var_acs21 <- load_variables(2021, "acs1", cache = TRUE)
var_acs22 <- load_variables(2022, "acs1", cache = TRUE)
#getting median income data from acs for 2021
median_income <- get_acs(geography = "county subdivision",
variables = c(medincome = "B19013_001"),
state = "OH",
year = 2021,
geometry = TRUE) %>%
drop_na()
#getting public transport data from census for 2021
public_transport <- get_acs(geography = "county subdivision",
variables = c(public_route = "B08006_008"),
state = "OH",
year = 2021,
geometry = TRUE) %>%
drop_na()
# finding median income levels in 2020 in ohio
income_ohio2020 <-
get_acs(geography = "county",
state = "OH",
variables = c(medincome = "B19013_001"), # median household income
# "B01003_001"), # total pop
year = 2020,
geometry = TRUE)%>%
rename(median_household_income20 = estimate)%>%
rename(margin_of_error = moe) %>%
mutate(margin_of_error= coalesce(margin_of_error, 0)) %>%
select(GEOID, NAME,median_household_income20,geometry)
# finding poverty rates by state in 2020 (the state = "OH" filter below
# is commented out, so rates are returned for all states)
povertyrates_ohio2020 <-
get_acs(geography = "state",
#persons whose income in the last 12 months was below the poverty level
variables = "B17001_002",
#total persons for whom poverty status is determined
summary_var = 'B17001_001',
# state = "OH",
geometry = TRUE,
year = 2020) %>%
rename(population = summary_est) %>%
filter(population>0)%>%
mutate(pov_rate = estimate/population) %>%
mutate(pov_rate = pov_rate*100) %>%
select(NAME, population, pov_rate)
Visualizing telematics data can be challenging. Beyond the problems in loading and cleaning the data described above, simply understanding the tables well enough to decide which variables to plot proved difficult. Plotting variables to get a map is easy; creating a meaningful map that fits into a larger storyline is easier said than done.
Given that the data dictionary provided with the dataset was not very comprehensive, our team had to make assumptions when interpreting the variables and units of observation. Even after filtering the data to Ohio, some coordinates outside the state slipped through. We suspect this stems from how the data was collected and stored rather than from the code used to filter the tables.
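One way to catch such stray coordinates is a rough bounding-box check on latitude and longitude. The limits below are eyeballed approximations of Ohio's extent, not official boundaries; a stricter check would test points against an Ohio polygon with `sf::st_within()`:

```r
# Rough sanity filter: does a GPS ping fall inside Ohio's approximate
# bounding box? (limits are approximations, not official boundaries)
ohio_lat <- c(38.4, 42.0)
ohio_lng <- c(-84.9, -80.5)

in_ohio <- function(lat, lng) {
  lat >= ohio_lat[1] & lat <= ohio_lat[2] &
    lng >= ohio_lng[1] & lng <= ohio_lng[2]
}

# Hypothetical usage on the vehicle location table:
# vehicle_location_data %>% filter(in_ohio(LAT, LNG))
in_ohio(39.96, -82.99)  # Columbus: TRUE
in_ohio(40.71, -74.01)  # New York City: FALSE
```

The function is vectorized, so it can be dropped straight into a `dplyr::filter()` call over the full table.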
Nevertheless, we were able to come up with multiple data visualizations, both static and interactive, to display specific data points in a purposeful manner.
#creating dataset for one day from vehicle location dataset
vehicle_location_data_one_day <- vehicle_location_data %>%
filter(str_detect(EVENT_TIMESTAMP, "2021-08-03"))
#splitting EVENT_TIMESTAMP into day and time for further analysis
vehicle_location_data_one_day[c("date", "time")] <- str_split_fixed(vehicle_location_data_one_day$EVENT_TIMESTAMP, ' ', 2)
vehicle_location_data_one_day$month <- format(as.Date(vehicle_location_data_one_day$date, format="%Y-%m-%d"),"%m")
vehicle_location_data_one_day$year <- format(as.Date(vehicle_location_data_one_day$date, format="%Y-%m-%d"),"%Y")
vehicle_location_data_one_day$day <- format(as.Date(vehicle_location_data_one_day$date, format="%Y-%m-%d"),"%d")
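An equivalent approach uses lubridate (attached by tidyverse 2.0), which parses the timestamp once instead of re-parsing the date string three times; this assumes `EVENT_TIMESTAMP` follows the "YYYY-MM-DD HH:MM:SS" format seen above. Note that `year()`, `month()`, and `day()` return numbers (8, not "08"):

```r
# One-pass alternative to the repeated format(as.Date(...)) calls:
# vehicle_location_data_one_day <- vehicle_location_data_one_day %>%
#   mutate(ts    = lubridate::ymd_hms(EVENT_TIMESTAMP),
#          year  = lubridate::year(ts),
#          month = lubridate::month(ts),
#          day   = lubridate::day(ts))
ts <- lubridate::ymd_hms("2021-08-03 14:22:05")
c(lubridate::year(ts), lubridate::month(ts), lubridate::day(ts))  # 2021 8 3
```

This is a sketch of an alternative, not a change to the analysis; the string-splitting approach above works fine for this dataset.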
# creating color palette to map census data
pal <- colorNumeric(
palette = "magma",
domain = median_income$estimate,
reverse = TRUE
)
#values far outside the income domain (tens of dollars vs. tens of
#thousands) fall back to the palette's default NA colour, gray:
pal(c(10, 20, 30, 40, 50))
## [1] "#808080" "#808080" "#808080" "#808080" "#808080"
#clustering all trips for one day
map_1 <- leaflet() %>%
#adding OpenStreetMap
addTiles() %>%
#adding census data on income as backdrop
addPolygons(data = median_income,
color = ~pal(estimate),
weight = 0.5,
smoothFactor = 0.2,
fillOpacity = 0.5,
label = ~estimate) %>%
addLegend(
position = "bottomright",
pal = pal,
values = median_income$estimate,
title = "Median Income ($)"
) %>%
#adding datapoints for all trips on one day
addMarkers(
data = vehicle_location_data_one_day,
lng = ~LNG,
lat = ~LAT,
label = ~VEHICLE_ID,
popup = ~VEHICLE_ID,
clusterOptions = markerClusterOptions()
)
map_1
By zooming in and out, the viewer can see the areas drivers took passengers through on that particular day. The trips are concentrated mainly in Columbus and northern Ohio. This plot is meant to show, at first glance, how trips are distributed across Ohio.
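The concentration claim can also be checked numerically: what share of the day's GPS pings falls inside a rough Columbus-area bounding box? The box limits below are eyeballed approximations, not municipal boundaries:

```r
# Share of points inside a lat/lng box; mean() of a logical vector
# gives the fraction of TRUE values
share_in_box <- function(lat, lng, lat_lim, lng_lim) {
  mean(lat >= lat_lim[1] & lat <= lat_lim[2] &
       lng >= lng_lim[1] & lng <= lng_lim[2])
}

# Hypothetical usage on the one-day dataset:
# share_in_box(vehicle_location_data_one_day$LAT,
#              vehicle_location_data_one_day$LNG,
#              lat_lim = c(39.8, 40.2),   # rough Columbus box
#              lng_lim = c(-83.2, -82.7))

# Toy data: one Columbus-area point, one northeastern-Ohio point
share_in_box(c(39.96, 41.50), c(-82.99, -81.70),
             lat_lim = c(39.8, 40.2), lng_lim = c(-83.2, -82.7))  # 0.5
```

A high share would quantify the visual impression of the cluster map above.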
#creating dataset for one vehicle id on that day (the date filter is
#redundant since vehicle_location_data_one_day already covers only
#2021-08-03, but it documents the intent)
vehicle_location_data_id_71 <- vehicle_location_data_one_day %>%
filter(VEHICLE_ID == 71) %>%
filter(str_detect(EVENT_TIMESTAMP, "2021-08-03"))
#mapping route for one vehicle id for one day
map_2 <- leaflet() %>%
#adding OpenStreetMap
addTiles() %>%
#adding census data on income as backdrop
addPolygons(data = median_income,
color = ~pal(estimate),
weight = 0.5,
smoothFactor = 0.2,
fillOpacity = 0.5,
label = ~estimate) %>%
addLegend(
position = "bottomright",
pal = pal,
values = median_income$estimate,
title = "Median Income ($)"
) %>%
#limit to first 3000 rows to decrease processing time
addPolylines(data = head(vehicle_location_data_id_71, 3000), lng = ~LNG, lat = ~LAT, group = ~VEHICLE_ID)
#leaflet.extras2::addPlayback(data = head(vehicle_location_data_id_71, 3000),
#time = "time",
#options = leaflet.extras2::playbackOptions(speed = 0.5))
map_2
triprequest_map <- leaflet(trip_data_full) %>%
addTiles() %>% # Add default OpenStreetMap map tiles
addMarkers(lat = ~lat_pickup, lng = ~long_pickup, clusterOptions = markerClusterOptions()
)
triprequest_map